First passage times in homogeneous nucleation and self-assembly 
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Motivated by nucleation and molecular aggregation in physical, chemical and biological settings, 
we present a thorough analysis of the general problem of stochastic self-assembly of a fixed number 
of identical particles in a finite volume. We derive the Backward Kolmogorov equation (BKE) for 
the cluster probability distribution. From the BKE we study the distribution of times it takes for 
a single maximal cluster to be completed, starting from any initial particle configuration. In the 
limits of slow and fast self-assembly, we develop analytical approaches to calculate the mean cluster 
formation time and to estimate the first assembly time distribution. We find, both analytically 
and numerically, that faster detachment can lead to a shorter mean time to first completion of a 
maximum-sized cluster. This unexpected effect arises from a redistribution of trajectory weights 
such that upon increasing the detachment rate, paths that take a shorter time to complete a cluster 
become more likely. 

PACS numbers: 02.50.Ga, 82.60.Nh, 87.10.Mn, 87.10.Rt 



I. INTRODUCTION 

The self-assembly of macromolecules and particles is a 
fundamental processes in many physical and chemical 
systems. Although particle nucleation and assembly have 
been studied for many decades, interest in this field has 
recently intensified due to engineering, biotechnological 
and imaging advances at the nanoscale levelir— . Aggre- 
gating atoms and molecules can lead to the design of 
new materials useful for surface coatings^, electronics 5 -, 
drug delivery^ and catalysis^. Examples include the self- 
assembly of DNA structures^ into polyhedral nanocap- 
sules useful for transporting drugs 1 ^ or the self-assembly 
of semiconducting quantum dots to be used as quantum 
computing bitsii. 

Other important examples of molecular self-assembly 
may be found in cell physiology or virology where pro- 
teins aggregate to form ion channels, viral capsids and 
plaques implicated in neurological diseases. One example 
is the rare self-assembly of fibrous protein aggregates such 
as f3— amyloid that have long been suspected to play a 
role in neurodegenerative conditions such as Alzheimer's, 
Parkinson's, and Huntington's disease^. In prion dis- 
eases, individual PrP c proteins misfold into PrP Sc pri- 
ons which subsequently self-assemble into fibrils. The 
aggregation of misfolded proteins in neurodegenerative 
diseases is a rare event, usually involving a very low con- 
centration of prions. Fibril nucleation also appears to 
occur slowly; however once a critical size of about ten 
proteins is reached, the fibril stabilizes and the growth 
process accelerates^. 

Viral proteins may also self-assemble to form capsid 
shells in the form of helices, icosahedral, dodecahedra, 
depending on virus type. A typical assembly process will 
involve several steps where dozens of dimcrs aggregate to 
form more complex subunits which later cooperatively 
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FIG. 1: Homogeneous self-assembly and growth in a closed 
unit volume initiated with M = 30 free monomers. At a spe- 
cific intermediate time < t < t* in this depicted realization, 
there are six free monomers, four dimers, four trimers, and 
one cluster of size four. For each realization of this process, 
there will be a specific time t* at which a maximum cluster 
of size N — 6 in this example is first formed (blue cluster) . 



assemble into the capsid shell. Usually, capsid formation 
requires hundreds of protein subunits that self-assemble 
over a period of seconds to hours, depending on experi- 
mental condition s 14 ' 15 . 

In addition to these two examples, many other biolog- 
ical processes involve a fixed "maximum" cluster size - 
of tens or hundreds of units - at which the process is 
completed or beyond which the dynamics change^. At 
times, the assembly process may involve coagulation and 
fragmentation of clusters as well, such as in the case of 
telomere aggregation in the yeast nucleus^. Developing 
a stochastic self-assembly model with a fixed "maximum" 
cluster size is thus important for our understanding of a 
large class of biological phenomena. 

Theoretical models for self-assembly have typically de- 
scribed mean-field concentrations of clusters of all pos- 
sible sizes using the well-studied mass-action, Becker- 
Doring equations^—. While Master equations for 
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the fully stochastic nucleation and growth problem 
have been derived, and initial analyses and simulations 
performed^— , there has been relatively less work on 
the stochastic self-assembly problem. We have recently 
shown that in finite systems, where the maximum cluster 
size is capped, results from mass-action equations are in- 
accurate and that in this case a discrete stochastic treat- 
ment is necessary^. 

In our previous examination of equilibrium cluster size 
distributions derived from a discrete, stochastic model^, 
we found that a striking finite-size effect arises when 
the total mass is not divisible by the maximum clus- 
ter size. In particular, we identified the discreteness of 
the system as the major source of divergence between 
mean-field, mass action equations and the fully stochas- 
tic model. Moreover, discrepancies between the two ap- 
proaches are most apparent in the strong binding limit 
where monomer detachment is slow. Before the system 
reaches equilibrium, or when the detachment is apprecia- 
ble, the differences between the mean-field and stochastic 
results are qualitatively similar, with only modest quan- 
titative disparities. 

In this paper, we will be interested in the distribution 
of the first assembly times towards the completion of a 
full cluster, which can only be determined through a fully 
stochastic treatment. Specifically, we wish to compute 
the time it takes for a system of M monomers to first 
assemble into a complete cluster of size N. We do not 
consider coagulation and fragmentation events, but, as a 
starting point, focus on attachment and detachment of 
single monomers. Statistics of the first assembly time^ 
may shed light on how frequently fast-growing protein 
aggregates appear. In principle, one may also estimate 
mean self-assembly times starting from the mean-field, 
mass action equations, using heuristic arguments. We 
will show however that these mean-field estimates yield 
mean first assembly times that are quite different from 
those obtained via exact, stochastic treatments. 

In the next section, we review the Bcckcr-Doring mass- 
action equations for self-assembly and motivate the for- 
mulation of approximate expressions for the first assem- 
bly time distributions. These will be shown to be poor 
estimates of the true distribution functions, leading us 
to consider the full stochastic problem in Section III. 
Here, we derive the Backward Kolmogorov equation as- 
sociated with the self assembly process and illustrate how 
to formally solve it through the corresponding eigenvalue 
problem. In Section IV, we explore three limits of the 
stochastic self-assembly process and derive analytic ex- 
pressions for the mean first assembly time in the strong 
and weak binding limits. Results from kinetic Monte- 
Carlo (KMC) simulations are presented in Section V and 
compared with our analytical estimates. Finally, we dis- 
cuss the implications of our results and propose further 
extensions in the Summary and Conclusions. 



II. MASS-ACTION MODEL OF 
HOMOGENEOUS NUCLEATION AND 
SELF-ASSEMBLY 

The classic mass-action description for spontaneous, ho- 
mogeneous self-assembly is the Bccker-Doring model2£, 
where the concentrations Ck(t) of clusters of size k obey 

N-l N 

ci(t) = -pic\ - cx PjCj + 2q 2 c 2 + ^2 1j c j 

i=2 3=3 
C%{t) = = -P2ClC 2 + - q 2 C 2 + <73C3 

Ck(t) = -PkCiCk +Pk-iC\C k -i - qkCk + qk+iCk+i 

Cjv(t) = PN-\C\C N -i - qNCN, (f) 

where pk and qk are the monomer attachment and de- 
tachment rates to and from a cluster of size k. A typical 
initial condition is Ck(t = 0) = (M/V)Sk,i, represent- 
ing an initial state comprised only of free monomers. For 
simplicity we set the volume V = 1. The above equations 
can be numerically integrated to find the time-dependent 
concentrations Cfc(t) for any set of attachment and de- 
tachment rates. We have previously shown that Eqs. Q] 
provide a poor approximation to the expected number of 
clusters when the total mass M and the maximum cluster 
size N are comparable in magnitude^. 

Although mass action equations provide approxima- 
tions to mean concentrations, they do not directly de- 
scribe any statistical property of the modeled system. 
Nonetheless, one may be able to heuristically derive esti- 
mates of quantities such as mean first assembly times. 
To estimate the mean time to completion of the first 
maximum cluster, we must consider a truncated set of 
mass-action equations which treats maximum clusters as 
"absorbing states" so that once maximum clusters are 
formed, the process is stopped and the time recorded. 
Thus, we set qw = in Eqs. Q] so that once clusters of 
size N are formed, no detachment is allowed. This choice 
ensures that completed assembly events will not influence 
the dynamics of any of the remaining smaller clusters. 

To estimate the mean first assembly time we may in- 
voke the statistical concept of survival probabilities, and 
heuristically combine it with the deterministic solutions 
of Eq.[TJ Following standard notation, we denote by S(t) 
the probability that the system has not yet formed a 
maximal cluster. This quantity is also known as the "sur- 
vival" probability. Its dynamics can be expressed using 
the probability flux J/v out of the last not fully formed 
maximal cluster state, or equivalently into the maximal 
one, conditioned on the system still surviving so that 



— - — = — JnW surviving up to time t). (2) 
at 

The flux Jtv(^ I surviving up to time t) conditioned on 
survival up to time t is not readily found, but a 
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This relationship assumes that the system is always is a 
surviving state (not yet formed a maximum-sized clus- 
ter). However, Eq. [6] also yields unphysical results at 
long times. A deterministic approximation that yields 
physically reasonable results can be obtained by finding 
the time at which the concentration of clusters of size N 
reaches unity 
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Nl 



(7) 



FIG. 2: Mean first assembly times evaluated via the heuristic 
definition Eq.0 (dashed line) and as a function of the detach- 
ment rate qt — q, for M = 7, N = 3 in panel (a) and for 
M = 9, N = 4 in panel (b). Here pi = p = 1. We also 
show the exact results (solid line) obtained via the stochastic 
formulation in Eq.[T2]which we derive in Section III. Qualita- 
tive and quantitative differences between the two approaches 
arise, which become even more evident for N > 3, q — > 0, as 
we shall later discuss. These discrepancies underline the need 
for a stochastic approach. 



mean-field approximation can be applied by assuming 
Jjv(i | surviving up to time t) « Jpj(t)S(t), where Jjv(^) 
is the unconstrained mean particle flux. Thus, the mean 
field approximation for the evolution of the survival prob- 
ability becomes 



dS(t) 
dt 



-J N (t)S(t). 



(3) 



To proceed, we may use deterministic results for Jat(£) 

J N (t) ~pAT-iCi(t)civ-i(i), (4) 
so that the survival probability can be estimated as 



and imposing — in Eqs.[TJ As an example, we con- 
sider the case M = 7, N = 3 for Pi = p = 1, q^s = q (and 
<?3 = as illustrated above), find cn{Tn) from Eqs.[TJ and 
plot the mean first assembly time obtained via Eq.[7] in 
Fig. HJa). For completeness we also show the exact re- 
sults obtained via the full stochastic treatment in Eq. [T2l 
the derivation of which we will focus on below. What 
clearly arises from Fig. [Ha) is that while the mean first 
assembly times obtained stochastically and via the mean- 
field equations are of the same order of magnitude, they 
are also quite different and show even qualitative discrep- 
ancies. For example, the stochastic mean first assembly 
time is non-monotonic in q, while the simple mean-field 
estimate is an increasing function of q. Discrepancies be- 
tween the heuristic and exact stochastic results exist also 
for the case M = 9, N = 4 shown in Fig.[5Jb). Here, 
most notably we can point out that for q = 0, while the 
exact mean first assembly time calculated according to 
our stochastic formulation diverges, it remains finite in 
the heuristic derivation. We shall later see that this trend 
will persist for all choices N > 3 and that the heuristic 
approach does not yield accurate estimates. A stochas- 
tic treatment is thus necessary and is the subject of the 
remainder of this paper. 



S(t) = cxp 



PN-l / Cx(t')cN-l(t')dt' 



-Cjv(t) 



(5) 

Note that while Eq.[5]satisfies S(t = 0) = I, S(t ->■ oo) 
0, due to cat(£ — > oo) being finite. As a consequence, the 
derived first assembly time will always be infinitely large, 
since the system has a finite survival probability even for 
t — >• oo, making the approximation invalid. Alternatively, 
we may approximate Eq.[2]as 



dS{t) 
dt 



'N- 



(6) 



III. BACKWARD KOLMOGOROV EQUATION 

To formally derive first assembly times for our nu- 
clcation and growth process it is necessary to de- 
velop a discrete, stochastic treatment. We thus define 
P(ni, ri2, . . . , 7ijv; t\mi, 777,2, ■ • ■ , ^n] 0) as the probability 
that the system contains n\ monomers, 772 dimers, 773 
trimers, etc, at time t, given that the system started from 
a given initial configuration (mi, m%, . . . mjy) at t = 0. In 
this representation, the Forward Master equation corre- 
sponding to self-assembly with exponentially-distributed 
monomer binding and unbinding events is given by2^ 
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P({n}; t\{m}, 0) = -A({n})P({n}; t\{m}, 0) + 2)(m + l)W+W+W 2 -P({n}; t|{m}, 0) 

AT-l 

+q 2 (n 2 + l)WfWrWrP{{n}-,t\{m},0) + ^ p 4 (ni + l)(n< + l)W 1 + W7 f W i + ^({n}; i|{m}, 0) 

TV 

+ Qiim + VWrWr_ 1 W+P{{n}; t\{m}, 0), (8) 

i=3 



where P({n}, £) = if any n, < and 

JV-l iV 
i=2 i=2 

is the total rate out of configuration {n}. Here, Wj 
are the unit raising/lowering operators on the number of 
clusters of size i so that 

W^W^Wr +1 P({ n };t\{m};0) = 

P(ni + 1, . . . ,rii + l,n i+ i - 1, . . . ; £|{ttt.}; 0). 



I 

The Master equation can be written in the form P = AP, 
where P is the vector of the probabilities of all possible 
configurations and A is the matrix of transition rates 
between them. The natural way of computing the dis- 
tribution of first assembly times is to consider the Back- 
ward Kolmogorov equation (BKE) describing the evolu- 
tion of P(ni, n 2 , • • ■ , tin] t\mi, m 2 , . . . , rajy; 0) as a func- 
tion of local changes from the initial configuration {to}. 
The BKE can be expressed as 



P({n}; t\{m}, 0) = -A({m})P({n}; t\{m}; 0) + |mi(mi - lJW+WfW^ P({n}; t\{m}; 0) 

JV-l 



q 2 m 2 W^W^W^P{{n}; t\{m}; 0) + J2 PimmiiW^Wr Wf +1 P({n}] t\{m}; 0) 

N 

Y, q l m l W+W+_ l W~ P{{n}; t|{m}; 0), 



(9) 



i=3 



r 



where the operators Wf~ act on the initial configuration 
index m,. In the vector representation, the BKE is P = 
A^P, where A^ is the adjoint of the transition matrix 
A as can be verified by comparing Eqs.[5] and |H1 The 
utility of using the BKE is that Eq.|9] can be used to 
determine the evolution of the survival probability, that 
can be naturally defined as 



S({m};t)= Y P({n};t\{m};0), 



(10) 



where we have made explicit the dependence on the ini- 
tial configuration {m}. In Eq.[TU]the sum is restricted to 
configurations where njy = so as to include only "sur- 
viving" states that have not yet reached any of the ones 
where njv > 1. S({ni};t) thus describes the probability 
that no maximum cluster has yet been formed up to time 
t, given that the system started in the configuration {to} 



at t = 0. One can now similarly sum Eq. [S]over all final 
states with fixed njv = to find that S({m}; t) also obeys 
Eq.[S]with P({n}; t\{m}, 0) replaced by S({m};t), along 
with the definition S(mx, TO2, • • • , TOjv > 1; i) = and 
the initial condition S(mi, m,2, . . . , TOjv = 0; 0) = 1. In 
the vector representation where each element of S cor- 
responds to a particular initial configuration, the gen- 
eral evolution equation for the survival probability is 
S = A^S, where we consider only the subspace of A* 
on nonabsorbing states. Solving the matrix equation for 
S leads to a vector of first assembly time distributions 



G- dS 



(11) 



where each element of G represents the first assembly 
time distribution starting from a different initial cluster 
configuration. Appendix lAl explicitly details the calcula- 
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FIG. 3: Allowed transitions in stochastic self-assembly start- 
ing from an all-monomer initial condition. In this simple ex- 
ample, the maximum cluster size N = 3. (a) Allowed tran- 
sitions for a system with M = 7. Since we are interested in 
the first maximum cluster assembly time, states with n?, = 1 
constitute absorbing states. The process is stopped once the 
system crosses the vertical red line, (b) Allowable transitions 
when M — 8. Note that if monomer detachment is prohibited 
(q — 0), the configuration (0, 4, 0) (yellow) is a trapped state. 
Since a finite number of trajectories will arrive at this trapped 
state and never reach a state where 713 = 1, the mean first 
assembly time Ts(8, 0, 0) — > 00 when q = 0. 



where we have assumed N — 3, M = 7, and pi = p, 
qi = q are constants. The label (7,0,0) indicates an ini- 
tial condition consisting of M = 7 monomers, no dimers, 
and no trimers. Corresponding expressions for the mean 
first assembly time arise for different initial conditions, 
such as e.g., (5,1,0), (3,2,0), or (1,3,0). These exact 
expressions for the mean first assembly times are non- 
monotonic in both q and p, indicating that there are op- 
timal q/p ratios for which the first assembly times are 
smallest. We will discuss the monotonicity of Tjv({m}) 
below, both in the limit of fast and slow detachment. 
For simplicity, we will retain the assumption of uniform 
Pi = p and qi = q throughout the remainder of this work 
and henceforth rescale time in units of p -1 . With this 
choice, 5 > 1 represents fast detachment, while q <C 1 
represents slow detachment. T3(7, 0, 0) has already been 
plotted in Fig.[2ja), contrasting it against the heuristic 
approximation of Eq.[7l A similar matrix approach can 
be used for the case M = 9, N = 4 yielding a cumber- 
some but exact expression for T4(9, 0, 0, 0) that is plotted 
in Fig.GHb). 

IV. RESULTS AND ANALYSIS 

In this section we study the properties of the first as- 
sembly time in the irreversible detachment limit, when 
q = 0, and in the limits of slow (0 < g « 1) and fast 
detachment (q 3> 1). 



tion procedures required to compute S, G, and the mo- 
ments of the first assembly times. For example, using 
Eq. El we find 



23(7,0,0) 



1 744p 3 + 487p 2 q + 6Qpq 2 



I05p 2 



27p 2 + 20pq + 2q 2 



(12) 



A. Irreversible limit (q = 0) 

First consider N = 3 and irreversible self-assembly where 
q = 0. In this case, the matrix A' is bidiagonal and the 
analysis outlined in Appendix |B] yields the exact expres- 
sion for any starting configuration: 



T 3 (M -2n,n,0) 



(M -2n)(M - 1) 



[M/2] j 

e n 

j=l k=n+l 



r 



(M-2fc + 2)(M-2fc + l) 
(M - 2k)(M - 1) 



(13) 



Note that when q = the mean first assembly time is 
finite when M is odd, but is infinite if M is even. This 
can be understood from the example M = 8, N = 3 illus- 
trated in Fig.[3^b), where a "trapped" state arises. In this 
case, there is a finite probability that the system arrives 
in the state (0, 4, 0) trapping it there forever since the as- 
sembly process is irreversible and detachment would be 
the only way out. Therefore, averaging over trajectories 
that include these "traps" , the mean assembly time will 
be infinite. For q = 0, we can show that a trapped state 
exists for any M and N > 4, yielding infinite assembly 



times. A trapped state arises when all free monomers 
have been depleted (ni = 0) before a maximum cluster 
has been able to assemble (njy = 0). In this case, the 
total mass must be distributed according to 



M 



N-l 
j=2 



(14) 



It is not necessarily the case that this decomposition is 
possible for all M and N, but if it is, then we have a 
trapped state and the first assembly time is infinite. To 
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show that the decomposition holds for N > 4 and for all 
M, we write M = a(N — 1) +j where a is the integer part 
[M/(N - 1)], so that 1 < j < N - 2. Now, if j ^ 1, then 
the decomposition is achieved with tiat_i = a, rij = 1, 
and all other nk = for k ^ j, (N — 1). We have thus 
constructed a possible trapped state. If instead j = 1, 
then we can rewrite M = (a - 1)(N - 1) + (N - 2) + 2 so 
that the decomposed state is defined by nj\r_i = a — 1, 
tin -2 = 1 and 7i2 = 1, with all other values of = 0. 
This proves that for all M, N there are trapped states 
for q = 0. The only exception is when N = 3, where the 
last decomposition does not hold, since TV — 2 = 1 and by 
definition, monomers are not allowed in trapped states. 
Indeed, for N = 3, Eq.QT] gives AI = 2n 2 as the only 
trapped state, which is possible only for M even. The 
case M = 7 and N = 3 is shown in Fig.[3ta). 

According to our stochastic treatment, the possibility 
of trajectories reaching trapped states for q = exists 
for any value of M, N > 4, giving rise to infinite mean 
first assembly times. This behavior is not mirrored in 
the mean-field approach for q = 0, where cat (TV) = 1 for 
finite TV depending on initial conditions, if M is large 
enough as can be seen in Fig.[2Jb). For TV = 4, M = 
9, indeed 14(9, 0,0, 0) can be evaluated from Eqs.Q] as 
C4(1.7527) = 1. In the irreversible binding limit, we may 
thus find instances where the exact stochastic treatment 
yields infinite first assembly times due to the presence of 
traps, while in the mean-field, mass action case, the mean 
first assembly time is finite. This leads us to expect that 
mean-field approximation to the first assembly time will 
be inaccurate when q > 0, but small. Here, the trapped 
states (when q = 0) retain the system for a long time. 

Since infinite mean first assembly times are a conse- 
quence of the existence of trapped states one may ask 
what is the mean first assembly time conditioned on traps 
not being visited. To this end, we explicitly enumerate 
all paths towards the absorbed states and average the 
mean first assembly times only over those that avoid such 



trap 



,29,30 



To be more concrete, we first consider the case 



N = 3. Here, in order to reach the absorbing state where 
713 = 1, one or more dimers must have formed. Let us 
thus consider the specific case 1 < n 2 < [ M 2 ~ 1 ] • Here, 
the second bound arises because after n 2 dimers have 
formed, at least one free monomer must remain in order 
to attach to one of the 712 dimers to form the first trimer. 
Since at every iteration both the formation of a dimcr 
and of a trimer can occur, the probability of a path that 
leads to a configuration of exactly n 2 dimers is given by 



2n 2 (M-2n 2 ) 



(M - 2n 2 )(M - 2n 2 - 1) + 2(M - 2n 2 )n 2 ' 



(16) 



Upon simplifying the product of the two probabilities in 
Eqs.[15] and [TBI wc find that the probability W n2 for a 
path where n 2 dimers are created before the final trimer 
is assembled is given by 



2n 2 



(M - 1) 



"2 + 1 



J (M -2k- 1). 



k=0 



Note that if M is even, we must discard paths where 
2n 2 = M, since, as described above, this case represents 
a trap with no monomers to allow for the creation of a 
trimer. According to Eq.[l5j the realization 2n 2 = AI 
occurs with probability 



Wa 



(M - 3)!! 



(M-lY 



M 



(17) 



Thus for M even, Wm_ represents the probability the 
system will end in a trap. We must now evaluate the 
time the system spends on each of the trap-free paths. 
Note that the exit time from a given dimer configuration 
(AI — 2k, fc, 0) is a random variable taken from an ex- 
ponential distribution with rate parameter given by the 
dimerization rate, \ d . k = (AI - 2k)(M -2k- l)/2. How- 
ever, the formation of a trimer is also a possible way out 
of the dimer configuration, with rate X ti k = (AI — 2k)k. 
The time to exit configuration (AI — 2k, k, 0) thus is itself 
an exponentially distributed random variable with rate 
Afe given by the sum of the two rates^i 



Afc = \d,k + A t k 



(M - 2k)(M - 1) 



The typical time out of configuration (AI—2k, k, 0) is thus 
given by 1/Xk- Upon summing over all possible values 
< k < n 2 , we find the typical time for the system to go 
through n 2 dimerizations 



"2 -. "2 



Finally, we can write the mean first assembly time as 



712 — 1 

n 

k=0 



(M -2k)(M -2k- 1) 



(M - 2k)(M -2k -I) + 2(AI - 2k)k' 



(15) 



The above quantity must be multiplied by the probability 
that after n 2 dimerizations, a trimer is formed, which 
occurs with probability 



r 3 (M,o,o)= W ^T n 

n 2 = l 



(18) 



It can be verified that for AI odd, Eq.[18] is the same as 
Eg. 1131 since the integer part that appears in the sum 
in Eq.[TS]is the same as its argument, thus including all 
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paths. For M even, paths with 2rii = M arc discarded, 
yielding a mean first assembly time averaged over trap- 
free configurations. 

Similar calculations can be carried out for larger N; 
however, keeping track of all possible configurations be- 
fore any absorbed state can be reached becomes quickly 
intractable. For example, when N — 4 one would need 
to consider paths with a specific sequence of i%2,k dimers 
formed between the creation of k and k + 1 trimcrs until 
713 trimers are formed. The path would be completed by 
the formation of a cluster of size N = 4. We would then 
need to consider all possible choices for 1 < n 3 < 
such that traps are avoided and evaluate the typical time 
spent on each viable path. Because of the many branch- 
ing possibilities, it is clear that the enumeration becomes 
more and more complicated as N increases. 



trap, starting from the (M, 0, 0) initial configuration for 
q = 0. This quantity can be evaluated by considering 
the different weights of each path leading to the trapped 
state. An explicit recursion formula has been derived in 
our previous workS^ in Section 4 and in Eq. A. 12. In the 
N = 3 case however, the paths are simple, since only 
dimers or trimers are formed, leading to 



M 

P*^,—,0) = 
{ 2 ' (M 



(M - 3)!! 



M 



(19) 



which is the same as what was derived in Eq.[17] The 
first assembly time T(0, M/2, 0) starting from state 
(0,M/2,0) is 



B. Slow detachment limit (0 < g < 1) 

Although mean assembly times are infinite in an irre- 
versible process (except when M is odd and N = 3), 
they are finite when q > 0. For general values of M and 
N and for small q > 0, we can find the leading behavior of 
the mean first assembly time T/v(M, 0, • • ■ ,0) perturba- 
tively by considering the trajectories from nearly trapped 
states into an absorbing state with at least one completed 
cluster. 

Since the mean arrival time to an absorbing state is 
the sum of the probabilities of each pathway, weighted 
by the time taken along each of them, we expect that 
the dominant contribution to the mean assembly time in 
the small q limit can be approximated by the shortest 
mean time to transition from a trapped state to an ab- 
sorbing state. This assumption is based on the fact that 
the largest contribution to the mean assembly time will 
arise from the waiting time to exit a trap, of the order 
of ~ l/q, since detachment is the only possible path out 
of the otherwise trapped state. The time to exit any 
other state will be of order 1 since monomer attachment 
is allowed. For sufficiently small detachment rates q, we 
thus expect that the dominant contribution to the mean 
assembly time comes from the trajectories that sample 
nearly trapped states and that T/v(M, 0, . . . , 0) ~ l/q. 

Again, first consider the tractable case N = 3 and 
M even, where it is clear that the sole trapped state is 
(0, M/2, 0) and the "nearest" absorbing state is (1, M/2- 
2, 1). Since the largest contribution to the first assembly 
time occurs along the path out of the trap and into the 
absorbed state, we posit 



M M 
T 3 (M, 0, 0) ~ P*(0, — , 0) r 3 (0, —,0), 

where P*(0, M/2, 0) is the probability of populating the 



2 Tl 



1,0). 



(20) 



Here, the first term is the total exit time from the trap, 
given by the inverse of the detachment rate q multiplied 
by the number of dimers. The second term is the first 
assembly time of the nearest and sole state accessible 
to the trap. This quantity can be evaluated, to leading 
order in l/q, as 



r 3 (2,y-i,o) 



2(f -1) 



-T 3 (0,^,0), (21) 



where we consider that the trap will be revisited upon ex- 
iting the state (2, M/2 - 1, 0) with probability l/(2(4f - 
1) + 1). In principle, Eq.[2U should also contain an- 
other term representing the possibility of reaching state 
(4, M /2 - 2, 0) via detachment from state (2, M/2 - 1, 0) 
and its contribution to the first assembly time. However, 
the magnitude of this term would be much smaller than 
l/q, since detachment rates are of order 0(q) <C 0(1/ q). 
Another term that should be included in Eq.[2T] is the 
possibility of reaching the absorbing state (1, M/2 — 2, 1). 
This term however, yields a zero contribution to the first 
assembly time. Upon combining Eqs.[20] and [21] we find 
that as q — > 



T (o M m- 2(M ~ 1} 1 

T3( °'T' 0) -M(M-2)g- 



Finally, T^M, 0,0) can be derived by multiplying the 
above result by Eq.[T!J] We can generalize this procedure 
to find the dominant term for the mean assembly time 
starting from any initial state (M — 2n, n, 0) in the limit 
of small q, N = 3 and for M even 
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T 3 (M,0,0)~T 3 (M- 2,1,0) 
T 3 (M -2n,n,0) 
T 3 (0,M/2,0) 



2(M-3)!! 1 
M(M - 2)(M - l) A '/2-2g' 
2(M-2n-l)!! 1 

M(il/-2)(A/- l)^/2-n-l ~ 

2(M - 1) 1 
M{M-2)q' 



(22) 

2 < n < M/2 (23) 

(24) 



The next correction terms do not have an obvious closed- 
form expression, but are independent of q. Note that 
when q is small and increasing, the mean first assembly 
times decrease. This is also true for odd M. A larger q 
leads to a more rapid dissociation, which may lead one 
to expect a longer assembly time. However, due to the 
multiple pathways to cluster completion in our problem, 
increasing q actually allows for more mixing among them, 
so that at times, upon detachment, one can "return" to 
more favorable paths, where the first assembly time is 
actually shorter. This effect is clearly understood by con- 
sidering the case of q = when, due to the presence of 
traps, the first assembly time is infinite. We have al- 
ready shown that upon raising the detachment rate q to 
a non-zero value, the first assembly time becomes finite. 
Here, detachment allows for visiting paths that lead to 
absorbed states, which would otherwise not be accessi- 
ble. This same phenomenon persists for small enough q 
and for all M, N values. The expectation of assembly 
times increasing with q is confirmed for large q values, as 
we shall see in the next section. Taken together, these 
trends indicate the presence of a minimum in the mean 
first assembly time that occurs at an intermediate value 
of the detachment rate q. 

We can generalize our estimate of the leading 1/q term 
for the first assembly time to larger values of N via 

T N (M, 0, . . . , 0) = P*m)T N ({fi}), (25) 
M 

where {fi} are trapped state configurations for q = 0. 
The values of P*({fj,}) can be calculated as described 
above using the recursion formula presented in2^. Ap- 
proximate mean first assembly times T/v({/x}) from traps 
{fj,} may be found by considering equations for the short- 
est sub-paths that link traps to each other. For in- 
stance, in the case of M — 9, N = 4 the only trapped 
states are (0, 3, 1, 0) and (0, 0, 3, 0), with associated prob- 
abilities P*(0, 0,3,0) = 921/5488 and P*(0,3, 1,0) = 



2873/24696, respectively The shortest path linking the 
two traps is (0,3,1,0) -> (2,2,1,0) -> (1,1,2,0) 
(0,0,3,0), which yields, to first order, T(0, 1,3,0) = 
T(0, 0,3,0) = 1/(2(7). Finally, from Eq.^we find that 
T(9, 0,0,0) = 2005/(14112?) which can be verified by 
constructing the corresponding D(9,4) = 12 dimensional 
transition matrix and solving the linear eigenvalue 
problem. Enumerating trajectories that intersect nearly 
trapped states becomes increasingly complex as M and 
N increase since more traps arise, leading to the identi- 
fication of more entangled sub-paths connecting them. 



C. Fast detachment limit (q — > oo) 

We now consider the case where detachment is much 
faster than attachment and q 3> M. In this limit, we 
expect the full assembly of a cluster to be a rare event in 
the large q limit, and that the mean assembly time will 
increase monotonically with q. 

Dominant path approximation - Our first approximation 
is based on the observation that for q — > oo the domi- 
nant configurations are those with the most monomers 
(the higher states in each column of Fig. [3]). Thus, 
the dominant trajectories will be the ones that most di- 
rectly arrive at the absorbing state with one full clus- 
ter. For N = 3, the overwhelmingly dominant paths are: 
(A/, 0, 0) ^ (M - 2, 1, 0) ^ (M - 3, 0, 1). The dynam- 
ics of the probabilities of the two "surviving" states with 
n 3 = can be represented by a linear 2x2 system that 
is easily solved to yield, in the q — > oo limit, 



T 3 (M,0,0) ~ T 3 (M -2,1,0) ~ — ; 4- r , 

iy ' ' ' iy ' ' ' M(M - 1)(M — 2) 

The dominant path method can be generalized to any 
M > N for q > M as follows 



I 

(M,0,0, ...,0) ^ (M-2,l,0...,0) ^ •■• ^ (M - r, 0..., 1, .., 0) ^ •■• ^ (M - N, 0, ...0, 1). (26) 

I 

Here, the corresponding transition matrix is tridi- agonal and of dimension (N — 1) with elements r\ 1 = 
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~r\,2 = -M(M-l)/2 and r^_ x = q,r{ k = -q-(M- 
k),r f k k+1 = (M - k) for 2 < k < (N - 1). The in- 
verse of can be computed by a three-term recurrence 



formula^. After some algebraic manipulation, we can 
write the first assembly time along the path in Eq.[26] for 

any M > N and for q > M as 



Tn(M, 0, . . . , 0) 



2V-2 



N-2 k 



U(M-(N-£))t 



-k 



k=0 1=1 



M(M - 1) 



N-j-1 k 



3=2 1=2 



j-k 



k=0 1=1 



r 



(27) 



Our notation is such that products with the lower index 
larger than the upper one are set to unity. In Eq.[271 the 
largest term in the q — > oo limit is given by 



Tn(M, 0, . . . , 0) 



2q 



N-2 



i) 



The additional assumption M 3> N on the other hand, 
leads to the approximation M — i ~ M so that Eq.[27l 
becomes 



Hybrid approximation - We now consider a differ- 
ent approach to the fast detachment q — > oo limit 
by using a "pre-equilibrium" or "quasi steady-state" 
approximation^ that partially neglects correlations be- 
tween some of the cluster numbers by separating time 
scales between fast and slow varying quantities. We will 
use the pre-equilibrium approximation on the stochastic 
formulation of Eq.0 however, to illustrate the method, 
we will first apply it to Eqs.Q] with q^ = 

The equilibrium values c^ q arising from Eqs. [1] where 
Pi = 1 are given by 



2q l 



T N (M, 0,...,0) 



JV-l 



M N 



N-l 

E 

L k=2 



(k - l)M k 



2 N y 2 M>> 

^ k=0 



n e 1 — 



(28) 

Results for other choices of initial configurations {to} can 
be obtained by following the same reasoning illustrated 
here. We expect Tjv({to}) not to be too different from 
Tn(M, 0, . . . , 0) in the strong detachment q —> oo case 
when any initial clusters will rapidly disassemble, leading 
the system towards the free monomer configuration. The 
distribution of first assembly times can also be obtained 
within the dominant path approximation, as outlined in 
Appendix [C] 

We expect these results to hold for large q > M, small 
values of N and moderate values of M so that the most 
likely trajectories follow the dominant path. However, 
due the possibility of many branching paths in configura- 
tion space, modest changes in {M, N, q} may allow sam- 
pling of secondary paths that yield different estimates 
of the first assembly time. Indeed, as both M and N 
become larger, the creation of several intermediate clus- 
ters may be more favorable than progressively adding 
monomers to the largest one. In the next subsection we 
thus introduce a "hybrid" approach, where the possibility 
of having multiple intermediate aggregates is included by 
assuming that the first r clusters are distributed accord- 
ing to the Beckcr-Doring equilibrium distribution and the 
remaining N — r follow a monomer-to-largest cluster path 
towards complete assembly 



for 2 < i < N, whereas c^ q can be obtained using the 
mass conservation constraint 



Wc can now define the fluxes J*(t) 



J+(t) scWc^i) 



jr(t) =qc l+1 (t) 



2q l 
2<f - 2 ' 



for 1 < i < N — 1. Note that as q — > oo all fluxes de- 
crease and that since q is large, J^(t) ^> J^-i{t)- This 
implies that smaller clusters experience faster dynamics 
and motivates the quasi-steady state approximation. We 
may thus consider the first N — 1 reactions to be at equi- 
librium so that Eqs. [T] can be rewritten as a function of 
the mass x(t) contained in all clusters except the largest 
one: 



JV-l 



s(t) = J2 ic *(*)- 



(29) 
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Eqs.[T]now become 



x(t) = -Nc N (t) = -^ r ^c 1 (x) N , (30) 
where c\{x(t)) satisfies 

N-l . 

i=2 H 

Upon solving Eq. l31[ we can obtain c\ (x) , which can then 
be inserted in Eq.[3D]to determine c/v(£) and x(t). Upon 
using the crude approximation c±(x) ~ x in Eq. 1301 we 
can solve for x(t) and find the time at which cat = 1, or 
equivalently x = M — Ncn = AI — N 



In particular, we require the "fast" subsystem to be er- 
godic and to possess a unique equilibrium distribution. 
The dynamics of the "slow" subsystem is then obtained 
by averaging the fast variables over their equilibrium dis- 
tribution; the basic assumption is that while slow vari- 
ables evolve, the fast ones equilibrate instantaneously to 
their average values^. As we shall see, due to the equilib- 
rium hypothesis, summing Eq.[5] over the variables that 
constitute the fast subsystem, will lead to the vanish- 
ing of all terms that do not modify the slow variable, 
and all remaining terms will involve averages of the fast 
variable^. 

Just as in the deterministic case, we allow the first 
N—l cluster sizes to equilibrate amongst each other and 
write the probability distribution function using a mean- 
field approach 



T N (M, 0,...,0) 



,AT-2 



N 



N - 1 



N - 1 



A more accurate result can be found by allowing the sum 
in Eq.[3T]to go to infinity so that 



P({n}; t\{m}, 0) = P cq ({n'}\n N )P(n N ; t\{m}, 0). (32) 

For fixed n^r, P cq ({n'} \n n) represents the equilib- 
rium distribution function for the first, fast {n'} = 
{rii, . . . , njv-i} cluster sizes and 



c\{x) 



q + \J Aqx + q 2 



Using this expression in Eq. [3(3 the mean mass in the 
fast clusters x(t) can be explicitly computed. Note that 
we could as well have assumed that only the first 1 < 
r < N — 1 clusters equilibrate among themselves, and 
solved a reduced system of N — r + 1 equations. We do 
not pursue this explicitly here, and move directly to the 
stochastic version of pre-equilibration. 

A separation of time scales can also be performed in 
stochastic systems, where the basic assumptions for pre- 
equilibration are the same as for the deterministic case. 



P(n N ;t\{m},0) = ^ P({n'},n N ;t\{m},0) (33) 



{„'} 



is the probability distribution for the last, slow cluster 
size un- The sum in Eq.[33]is to be performed over all 
values of {n 1 } such that mass conservation 1 itii = 
M — Nun is obeyed. Note that while P eq ({n'}\nN) does 
not depend on the initial conditions of the {n'} clus- 
ters, it does depend on njv- Upon inserting the ansatz 
in Eq.[32]in Eq.[5]and performing the summation over all 
configurations {n'} with fixed njy, we find 



P(n N ;t\{m},0) = - ((ninjv-i |?ijv)cq + qn N )P(n N ; t\{m}, 0) 

+ (nin A r„ 1 |n A r - l) cq P(n N - l;t\{m},0) + q(n N + l)P(n N + l;t\{m},0). 



(34) 



r 



In Eg. [34l we have used the notation (ninjv_i|niv)eq an d a death rate qn^q. Starting at 

rijv = at time t = 0, the first birth event coincides with 

l the first assembly time so that the survival probability 

(nin JV -i|n JV ) eq = nm JV _ 1 P eq ({n }\n N ) (35) can be written ag 

{«'} 



representing the equilibrium second moment 
(ninjv-i|fijv) oq , which is an average over all fast 
variables with the added constraint that they have 
total mass M — Nun. Eq.|M] implies that njy follows 



S N -i(t) = exp I (nin N -i\n N = 0) eq dt 



(36) 



a Markovian birth and death process with birth rate where the "N— 1" indicates that all clusters of size N— 1 
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and smaller have been pre-equilibratcd. Having defined 
A = (nirijv-il'T'jv = 0} cq in Eg. 1361 the first assembly time 
distribution is exponential 

G N - 1 ({m};t) = XeT Xt , (37) 

and the mean first assembly time is given by 
Tn(M,--- , 0) = 1/A. The remaining difficulty lays in 
determining the quantity (nin,N-i\nN)cq- We may re- 
sort to a very crude approximation, by simply using the 
Becker-Doring results 

(nirajv_i|njv) eq ~ q q c^_ 1 (38) 



Eq.[38]can now be used to estimate A and all other related 
quantities. Our work so far implies that the first assem- 
bly time is exponentially distributed according to Eq. 1371 
However, upon comparing with results from Monte- Carlo 
simulations in the next section, we will show that the 
N — 1 pre-equilibration and is often not a good approx- 
imation. As outlined in Appendix [Dl a less drastic ap- 
proximation can be implemented by allowing only the 
first r species (1 < r < N) to pre-equilibrate. This more 
restricted pre-equilibration approximation can occasion- 
ally provide better fits to simulation as we will see in the 
next section. 



V. COMPARISON WITH SIMULATIONS 

In this section we present results derived from simula- 
tions of the stochastic process associated to the proba- 
bility distribution process for various values of {M, N, q}. 
Specifically, we use an exact stochastic simulation algo- 
rithm (kinetic Monte-Carlo, KMC) to calculate first as- 
sembly time o 40 ' 41 . For each set of {M, N, q} we sample 
at least 10 4 trajectories and follow the time evolution of 
the cluster populations until tin = 1, when the simu- 
lation is stopped and the first assembly time recorded. 
We compare and contrast our numerical results with the 
analytical approximations evaluated in the previous sec- 
tions. 

We begin with the simple case of M = 7 and N = 3 
in Fig.[4ta) where we plot the mean first assembly time 
X3(7, 0, 0) as a function of q obtained via our exact results 
Eq.HU and by runs of 10 5 KMC trajectories. Numerical 
and exact analytical results are in very good agreement, 
in contrast to the discrepancies between the fully stochas- 
tic and mean field treatments observed in Fig.[3J For 
comparison, we also plot in Fig.^b) the mean first as- 
sembly time T 3 (8, 0, 0) for M = 8 and N = 3, where the 
presence of the trapped state (0, 4, 0) leads to a diverg- 
ing first assembly time for q = and to the asymptotic 
behavior T 3 (8,0, 0) ~ 1/q for q — > 0, as predicted. Note 
that as discussed above Tz(7, 0, 0) is finite for q = due 




q 



FIG. 4: Mean first assembly times for M = 7 and N = 3 in 
panel (a) and M — 8 and TV = 3 in panel (b). Exact results 
derived in Eq.[l2] are plotted as black solid lines, while red 
circles are obtained by averaging over 10 s KMC trajectories. 
The dashed blue line shows the q — > approximation in Eg . 1221 
and the q — > 00 approximation in Eg . 1271 

to the lack of trapped states for N = 3 and M odd. We 
do not plot the first assembly time distributions as their 
features are similar to ones we will later discuss. 

We generalize this analysis by plotting numerical esti- 
mates of Tio(M, 0, ■ ■ • , 0) as a function of q for various 
values of M in Fig.^a). As expected, for small g, the 
mean first assembly time scales as 1/q for all values of 
M. Similarly, for all values of M, the first assembly time 
presents a minimum, due to the previously-described in- 
creased weighting of faster pathways upon increasing q 
for small enough values of q. For larger values of q 
we expect the most relevant pathways towards assem- 
bly to be the ones constructed along the linear chain de- 
scribed in ([25)1 . Indeed, we find that in accordance with 
Eq$MT N (M,0, ■ ■ • ,0) ~ 2q N ~ 2 /M N as q > M. Small 
and large q estimates using the dominant path approxi- 
mation are shown in Fig.[5ja). 

As discussed earlier, the dominant path approxima- 
tion becomes less accurate as M increases, since the lin- 
ear chain pathway neglects other possible routes towards 
complete assembly, that become relevant as M increases. 
In Fig.(5jb) thus we plot the same data points, using the 
hybrid approximation discussed above for large q, with 
r = 2. Note a much closer fit with the simulation data, 
especially as M increases. 

In Fig.[6fa) we plot Tjv(M, 0, • • ■ , 0) as a function 
of M for q fixed and various N, while in Fig.[5fb) 
T N (M,0, . . . ,0) is plotted as a function of M for N 
fixed and various q. Both Figs.^a) and^b) show that 
the results derived in Eq.[27] for large q using the dom- 
inant path approximation are accurate provided M is 
not too large compared to N, As shown by the black 
solid lines, in this case T N (M,0,--- ,0) ~ 2q N - 2 /M N . 
For larger values of M, the dominant path approxima- 
tion becomes inaccurate: numerical results indicate that 
T N (M, 0, • • • , 0) ~ 1/M V with v - 1 as q -> 00. In this 
regime, the hybrid approximation with r = 3 yields a 
better fit, as shown by the solid lines in Figs.EJa) and 

ED>). 

Finally, in Figs.JTHHlwe plot the distribution function 
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FIG. 5: Comparison of theory with simulations for N = 10, 
and several values of M. Symbols are derived from 10 4 KMC 
simulations for M = 50, 200, 1000. In panel (a) the dashed 
lines are obtained by plotting the curve Tio(M, 0, . . . , 0) = 
A/q where A is given by imposing passage through the first 
point to the left in the graph. Note that all other points align 
to the same curve. Solid lines are derived from Eq.[27]in the 
dominant path approximation. In panel (b) results from the 
hybrid approximation with r — 2 in Eq, ID2l are superimposed 
on the same data. Note the much better fit in the hybrid 
approximation as q — ► 00, especially as M becomes larger. 
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FIG. 6: First assembly times Tm(M, 0, . . . , 0) as a function of 
M for q — 100 and several values of N in panel (a), and for 
N = 5 and several values of q in panel (b). The black dashed 
lines represent the dominant path approximation for large 
q in Eg . 1271 while the solid black line represents the hybrid 
approximation in Eg, ID2l for r — 2. We chose to plot only 
representative cases, not to clutter the graphics, but similar 
trends persist in panel (a) for iV = 4, 6, 8 and in panel (b) 
for q = 10, 100. Note that the dominant path approximation 
ceases to be accurate for very large values of M and that the 
hybrid approximation provides a better fit as q — > 00. 



G({M, 0, ••• , 0},t) of the first assembly times for sev- 
eral representative choices of {M,N,q}. As illustrated 
in the figure captions, analytical estimates were calcu- 
lated either by inverse Laplace transforming Eq. lCll after 
having numerically found its poles, or via the hybrid ap- 
proximation in Eq. lD2l with specific values of r. From 
Fig. [7] note upon increasing q, G({M, 0, . . . , 0}, t) grad- 
ually shifts from having a log-normal shape towards an 
exponential distribution characterized by the decay rate 
evaluated in Eq. lC3[ Some combinations of M and N, 
such as M = 200 and N = 8 in Fig. [5] yield a bimodal 
distribution for small q. This can be explained by not- 
ing that while fast routes towards nucleation may ex- 
ist, other pathways lead the system to the previously 
described trapped states where ri\ = njy = 0. Exit from 
these traps is unlikely for small q, yielding larger first as- 
sembly times. The emergence of a bimodal distribution 
should be more apparent for larger values of N when 
there is a longer pathway towards assembly and more 
potential traps. Indeed, although not shown in Fig. [7] for 
M = 50 and N = 4, a few trajectories populate the re- 



gion t ~ 1/q, indicating passage through at least one of 
the nine possible trapped states. However, the weights of 
these possible paths are very small (only about 10 or so 
out of 10 runs incurred into a trapped state), so we do 
not include them in Fig. [Jj which is truncated at t <C l/q, 
when q — >• 0. This occurs also for M = 8 and TV = 3, 
where a minor spread due to the (0, 4, 0) trap and cen- 
tered around t ~ l/q arises in the distribution tail, and 
which is absent from the trap-free case of M = 7 and 
N = 3. 

Note that although few paths may populate the region 
t ~ l/q their contribution to the mean first assembly 
time may be significant. In Fig. [5] we also include ana- 
lytical estimates of the first assembly times: the dashed 
red curves are derived from the dominant path approxi- 
mation in Eq. lCll and the solid blue ones from the hybrid 
approximation in Eq. lD2l using r — 3. As noted above, 
for very large q, the dominant path approximation fails 
and the hybrid approximation provides a closer fit to our 
numerical results. 

In Fig.|ni we plot the first assembly time distribution 
for fixed q — 100 and N — 8 and varying M. As ex- 
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FIG. 7: Probability distributions for the first assembly time 
for iV = 4 and M = 50 and for various values of q. The 
black bars are obtained as a normalized histogram of 10 4 
KMC simulations. The dashed red and solid blue lines are 
the probability density functions estimated via the dominant 
path approximation in Eq. lCll and via the hybrid approxima- 
tion with r — 3 in Eq. lD2l respectively. The detachment rate 
q increases as indicated in each subplot. Note that initially 
the distribution has a log-normal shape and later turns into 
an exponential. As predicted, the analytical estimate given 
by Eq. lCll becomes accurate for q > M. Also note the change 
in scale and the broadening of the distribution as q increases. 
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FIG. 8: First assembly time distributions for N = 8 and 
M = 200 for various values of q. The black bars are ob- 
tained as a normalized histogram of 10 4 KMC simulations. 
The dashed red and solid blue lines are the probability den- 
sity functions estimated via the dominant path approxima- 
tion in Eq. lCll and via the hybrid approximation with r = 3 
in Eq. lD2l respectively. The detachment rate q increases as 
indicated in each subplot. Note that the distribution evolves 
from a bi-stable curve to, later acquiring a log-normal shape, 
before turning to an exponential. As above, as q — > oo the 
hybrid approximation yields a closer fit to the numerical data. 



pected, for q > M, G({M, 0, ••■ , 0},t) is well approxi- 
mated by the exponential distribution in Eq. lC41 As M 
increases the distribution acquires a log-normal shape. In 
this case, we find the hybrid approximation to fail regard- 
less of r. Indeed, our numerical results show that there 
is no specific criterion to ensure that the hybrid approx- 
imation will yield even qualitatively valid estimates for 
the first assembly distributions as M —> oo. Empirically, 
we find that while mean first assembly times predictions 
are quite accurate within the hybrid approximation, the 
first assembly distribution estimates are more likely to 
be accurate when the are exponentially distributed. 



VI. SUMMARY AND CONCLUSIONS 

We have studied the problem of determining the first as- 
sembly time of a cluster of a pre-detcrmincd size N to 
form from an initial pool of M independent monomers 
characterized by uniform attachment and detachment 
rates p = 1 and q, respectively. We have shown that while 
heuristic approaches using the traditional Becker-Doring 
equations can be developed, these fail to capture rele- 
vant qualitative features, such as divergences and non- 
monotonic behavior. A full stochastic approach, based 
on the Backward Kolmogorov equation, was investigated. 



We developed our stochastic model and were able to 
find exact results for the first assembly time in systems 
where M, N are small enough for analytical treatments 
to be feasible. For general M, N we were able to esti- 
mate general trends and behaviors for both large and 
small q. In particular, we find that in the absence of de- 
tachment, when q = 0, trapped states arise from which 
the system is never able to escape, leading to infinitely 
large first assembly times. Furthermore, we showed that 
these traps arise for all values of N > 3, regardless of M. 
The possibility of a trap, and of diverging first assembly 
times is not captured by the heuristic approach, and is 
confirmed by our KMC simulations. We are also able to 
show that for small q, the divergence in the first assembly 
time scales as l/q. The latter result may appear counter- 
intuitive, since larger detachment rates should intuitively 
hinder the assembly process, leading to the expectation 
that larger q implies larger first assembly times. While 
this is true in the q — > oo limit, in the case of q — » an 
opposite trend arises: the increased accessibility of poten- 
tial paths in configuration space that lead to more rapid 
first assembly times. As q increases, these new paths be- 
come increasingly populated, yielding an overall decrease 
in the first assembly time. Finally, for larger values of q 
we identify the most likely path to be traveled in phase 
space towards the first assembly of an iV-cluster and de- 
rive estimates for the associated first assembly time and 
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FIG. 9: First assembly time distributions for N = 8 and 
q = 100 for various values of M. The black bars are ob- 
tained as a normalized histogram of 10 5 KMC simulations. 
The dashed red and solid blue lines are the probability den- 
sity functions estimated via the dominant path approximation 
in Eq. (|C1|) and via the hybrid approximation with r = 3 in 
Eq. lD2l respectively. Total mass M increases as indicated in 
each subplot. Note that the distribution evolves from an ex- 
ponential with decay rate given by Eq, IC3l valid for q > M, 
towards a more log-normal shape. In this case, for very large 
M both dominant path and hybrid approximation fail. 



probability distribution functions. For g > 1, wc also 
considered a "hybrid" approach where the first few clus- 
ters were allowed to equilibrate, while the larger ones 
were still evolving stochastically. In certain cases we were 
able to find better agreement with numerical data, while 
for other combinations of {M, N, q} the hybrid approach 
fails. The collection of analytic approaches for the limits 
q = 0, < q <C 1, and q 1 are outlined in Sections IV 
A, B, and C, respectively. 

All of our analytical results were confirmed by our 
KMC simulations, from which we obtained first assembly 
times and related probability distribution functions. For 
certain choices of {M, N, q} the presence of traps could 
be indirectly inferred by the emergence of bimodal dis- 
tributions with very large first assembly times (on paths 
where traps were encountered) and very short ones (on 
others that were able to avoid them). These bimodal 
distributions may be smeared out for other choices of 
{M,N,q}. 

A number of additional stochastic properties of our 
self-assembly problem can be calculated. For example, 
one can derive analogous results for attachment and de- 
tachment rates pk and qk that depend on cluster size 
k. In particular, if we assume that binding and unbind- 
ing of monomers depends on the available surface area, 
and that clusters are of spherical shape, we can use the 
forms Pk,qk ~ fc 2 / 3 . Similarly, one could assume that 



stoichiometric limitations could exist so that attachment 
of monomers becomes progressively slower as completion 
of the 7V-mer is approached so that pk ~ (N — k) and 
qk ~ k. These extensions as well as the treatment of het- 
erogeneous nucleation and first "breakup" times will be 
considered in future work. 
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Appendix A: Calculation of Survival probability and 
moments 

To obtain expressions for moments of the first assembly 
time, it is useful to Laplace transform Eq.HTIso that 



G = 1 sS. 

Here, G and S are Laplace transforms of G and S, re- 
spectively. The vector 1 is the survival probability of any 
initial, nonabsorbing state, and consists of l's in a column 
of length given by the dimension of on the subspace of 
nonabsorbing states. Using this representation we may 
evaluate the mean assembly time for forming the first 
cluster of size N starting from the initial configuration 
{to} 



m rt » f°° dS({m}:t) , 

T N ({m}) = - / t U „. J ' > dt 

Jo 



dt 



(Al) 



S({m};t)dt = S({m};s = 0), 



Similarly, the variance Vjv({m}) of the first assembly 
time can be expressed as 



-/ ^ g({ ^ };t) dt-I*({ m }) 
tS({m};t)dt-T^({m}) 



= -2 



dS({m},s) 



5 2 ({to};0).(A2) 



s=0 



Upon Laplace-transforming S = A^S and applying the 
initial condition S({m}] t = 0) = 1, we find 



S = [si -At]- 1 !, 



(A3) 
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so that 



G = l-s[sl- A 1 ] -1 ].. 

The first assembly time starting from a specific configu- 
ration {m} is thus 



T N ({m}) = S({m};0) = -[(At)- 1 l] {m}J (A4) 

where the subscript {m} refers to the vector element cor- 
responding to the {m} initial configuration. Similar 
expressions can be found for the variance and other mo- 
ments. 

In order to invert the matrix A^ on the subspacc of 
non-absorbing states we first note that its dimension 
D(M 7 N) rapidly increases with M. In particular, wc 
find that the number of distinguishable configurations 
with no maximal cluster obeys the recursion 



[M/N] 

D(M,N + 1) = D ( M 



■jN,N), (A5) 



aL_! = (*-!)<&, 2<k<l 



M 

T 



i k,k 



(M -2k + 2)(M -2k+l) 
2 

(k- l)(M-2fc + 2)p 2 , 

1< k < H 



Pi - (k - l)q 2 



M 
T 



a 



(M-2fe + 2)(M-2fc + l) 



k,k+l 



-Pi, 



2<k<l 



M 

T 



where the first (second) index denotes the column (row) 
of the matrix. Using the above form for A^, we can 
now symbolically or numerically solve for the Laplace- 
transformed survival probability S({m}; s) and the mean 
self-assembly time S({m}; s = 0). 



where [M/N] denotes the integer part of M/N. For ex- 
ample, in Eq. lA5[ D(M, 2) = 1, and the only "surviving" 
configuration not to have reached at least one cluster 
of size k = 2 is (M,0). The next term is D(M,3) 
1 + [M/2] which, for M -)• oo yields D(M, 3) ~ M/2. 
Similarly, D{M, 4) can be written as 



[AT/3] 

D(M,4)= Y. D(M-3j,3) 



~M~ 




~ M~ 


Y 




~2 



M*_ 

6 : 



where the last two approximations are valid for large M. 
By induction, we find 



D(M,N) 



M 



N-2 



(N-iy. 



From these estimates, it is clear that the complexity of 
the eigenvalue problem in Eg . 1 A4I increases dramatically 
for large M. This enumeration of states and the associ- 
ated matrix method for computing first assembly times 
is analogous to the study of first passage times on a 
network^. However, rather than considering statistical 
properties of a scale free network, we are concerned with 
a probability flux across a specific realization of a state 
space network. 

We begin by studying the case of N = 3 for gen- 
eral M where instructive explicit solutions can be de- 
rived for the mean assembly times. In this case, the 
eigenvalue problem for the vector of survival probabilities 
S = (S(M, 0, 0; t),S(M - 2, 1, 0; t),S(M - 4, 2, 0; t), . . .) 
can be written using a tridiagonal transition matrix A^ 



whose elements 



4, 



"3,1 



take the form 



Appendix B: Calculation procedure for irreversible 
limit q = 

When q = 0, the matrix A^ now becomes bi-diagonal 
and a two-term recursion can be used to solve for the 
survival probability S(M — 2n,n,0;s) as follows. If the 
entries of the bidiagonal matrix A^ are denoted a\j , then 

the elements bij of the inverse matrix B = [si — A 1 *] 1 
are given by 



1 



if i > j, 

k,k> 



(Bl) 



if i <j 7 . 



a 



The Laplace-transformed survival probability, according 
to Eq. lA3l is the sum of entries of each row of [si — A^] 



S(M - 2n,n,0;s) = 



IT 



[M/2]+l 

- Y — 

j=i+l Ilk=i( S ~ a k,k) 



1-1 t 
k=i a k,k+l 



(B2) 

where i = n + 1 is the (n+ l) st row of [si — A^] . Upon 
performing the inverse Laplace transform of Eq. lB2l we 
can write the survival probability S(M — 2n,n,0;t) as 
a sum of exponentials and derive the full first assembly 
time distribution — dS(M — 2n, n, 0; t)/dt. Similarly, the 
mean first assembly time, according to Eq. lA4[ is T%{M — 
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2n, n, 0) = S(M — 2n,n, 0; s = 0). In particular, from and a = q N 2 + h.o.t. One can also calculate the vari- 
Eq. IB2I wc find ance of the first assembly time distribution to obtain 



a 



t 

k,k+l 



(M -2k + 2){M -2fc + l) 



*fc+i,fc+i 



(M-2fc)(M- 1) 



which, when inserted into Eq. IB2I with s 
Eq.[T3 



leads to 



Appendix C: Calculation procedure for fast 
detachment g > 1 

An estimate for the first assembly time distribution 
can be obtained within the dominant path assumption 
(Eq. I26p . By using the symmetry properties of the asso- 
ciated matrix we can find the Laplace transform of 
the first assembly time distribution G({M, 0, . . . , 0}; s)2£ 
in the q > M limit 



V N (M,0,...,0) 



nto 1 ^-*) 2 



2ft 



and similarly all other moments of the distribution. Fi- 
nally, we can also estimate the first assembly time dis- 
tribution G({M, . . . , 0},t) by considering the inverse 
Laplace transform of Eq. lCll specifically by evaluating 
the dominant poles associated to <f/v_i(s). In the large 
q limit, cIn-i(s) as evaluated via the recursion relations 
Eqs. lC2l can be approximated as 



JV-l 

d N - 1 (s)~q N - 2 s + -l[(M-i), 



G({M,0,...,0};s) 



(s) 



(CI) yielding the slowest decaying root A 



where djy-i(s) is a unitary polynomial of degree N — 1, 
given by the following recurrence 



s + 



M(M - 1) 



d 2 = (s + (M - 2) + q)di - q 



M(M - 1) 



(C2) 



di = (s + (M -i) + q)di-!-q(M -(i-l))di- 2 , 
for i > 2. 



Thus, d w -i(s) = s N - 1 + ... + fts 2 + as + \ W^iM - i). 



, N - 1 + ... + fts 2 
Note that the first assembly time is given by 



N 



N-l 



■2q 



i=0 



(C3) 



The above estimate allows us to write G({M, 0, . . . , 0}; t) 
in the large q limit as an exponential distribution with 
rate parameter Aj\r 



G({M,0,...,0};i) 



N-2 



I (M - i) (C4) 



i=0 



T N (M, 0,...,0) = lim 



1-G({M,0,...,0}; S ) 



By comparing Eq. lCll with Eq.[57]we note that the term 
a that appears in the above expansion for d/v-i(s), cor- 
responds to the quantity in the square brackets in Eq.l27l 
so that 



T N (M,0,...,0) 



2a 



Appendix D: Hybrid approximation for q ^> 1 and 

r <N-1 



A more general hybrid approximation can be imple- 
mented by assuming that only cluster of size r and 
smaller prc-equilibrate. We integrate Eq.|5]over all con- 
figurations but with n r+ i, ...tin fixed and obtain a reac- 
tion network for the remaining N — r clusters: 



(nin r |{n r+ i}) c q ^ {ni|{n r+ i})eqre r+ i^ (ni|{n r+ i}}eqnjv_i 

n r ^ n r+ i ^ n r+ 2 ■ ■ ■ > njv, 



(Dl) 



qn r+1 



qn r+2 



r 



where {n r+ i} = {n r+ i, ■ ■ ■ ,njy} so that (nin r |{n r+ i}) oq M — 2~2 r +i * n *> J us ^ as a bove for the choice r = N — 1. 
and (ni|{n r +i}} eq depend on the slowly varying mass, 
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In the reaction chain (|D1|) the last cluster size is 
treated as an absorbing state since we are only interested 
in the first assembly time, when tin = 1. The question 
still remains of properly evaluating (?iinjv-i|{"-r+i})cq 
and (ni|{n r+ i}) oq . For q — > oo it is reasonable to argue 
that that most of the mass is distributed among the fast 
clusters ni,...n r . Indeed, if we now assume that all the 
mass is contained in the fast cluster sizes, (m |{?V+i})cq 
and (nin r |{n r +i}) eq may be obtained via a distribution 
of r clusters with total mass M — Y^iLr+i — The 
rates in (|D1|) become independent of rij, for i > r. We can 
also drop the slow cluster size condition on the averaged 
quantities, and simply write (ui)jif and {n\n r )M- 

The cluster network in (jDlj) is a so-called linear Jack- 
son qucueing network^. Entry of particles in queue n r+ i 
occurs at rate (nin r )M, each of them moving indepen- 
dently according to the forward {ti\)m an d backward q 
transition rates. Starting with no particles in the queue 
at t = 0, the time-dependent probability distribution for 
this queueing network is well known^i. In particular, the 
number of particles in the last queue follows a Poisson 



distribution with mean 



(i(t) = {nin r ) M I V N -r{s)ds 
Jo 

where Vi(t) is the probability that a single particle is in 
the i th queue at time t after its entry in the system. Be- 
cause the last queue is absorbing, and from the definition 
of the first assembly time, the survival probability of our 
clustering process can be identified with the probability 
of having no particles in the last queue so that 



S{t) = Prob-jnjy = 0} 



exp 



(nm r ) 



P N - r (s) ds 



Finally, note the probability Vi(i), for 1 < i < N — r 
satisfies the Master equation Pi = AijVj with fj(0) = 5u 
and 



q - (ni)iif 
(tii)m 



M 



{ni)M -q - {ni)M 
(tvl)m 






0/ 



The first assembly time and the variance can now be 
derived according to standard formulae in Eq. lAll and 
Eq.EU 

As before, this technique requires an estimation of the 
first and second moments {tii)m and (?1i?v)m from the 
equilibrium distribution for clusters up to size r with to- 
tal mass M. A first crude way of approximating these 
asymptotic moments is to use the mean-field results 

/ \ C Q / \ eq eq 

(ni) M - Ci, (ninjv_i) M - c 1 4 c^_ 1 . 

We can also derive moment equations for (ti\)m and 
{n\n r )M directly from Eq.[5] Here, due to non-linear cou- 
plings between cluster sizes, the lower order moments will 
necessarily be described in terms of higher order ones. 
For instance, to determine the first and second moments 
we are interested in, we would need an expression for the 
third moment. To close moment equations, one usually 
assumes that the probability distribution for all cluster 
sizes obeys a certain form - either Gaussian, log-normal 
or negative binomial which are among the most standard. 



The third moment may then be written as a function of 
the first two, thus closing the system. The closed equa- 
tions of the first two moments become non-linear and a 
numerical solver is typically used to solve thern^. The 
case r — 2 has been extensively analyzed in^. In this 
paper we follow the same approach, using a Gaussian dis- 
tribution to approximate higher moments, thus deriving 
a closed system of r equations for (rii)M and r(r + l)/2 
equations for (niTij)M, where 1 < i, j < r. 

Finally, note that the hybrid approach described above 
is based on the assumption that all mass is initially con- 
tained within the first r clusters and are distributed ac- 
cording to the Becker-Doring equilibrium distribution. 
We expect this approach to be valid for moderate and 
large values of M and N, with q > M in order for the 
production of small clusters to be faster than the pro- 
duction of larger ones. How to choose the optimal cutoff 
value r is a delicate issue and depends on the specific pa- 
rameters {M, N, q}, although in general we find that all 
values oi2 < r < N — 2 give qualitatively similar results. 
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